Rule-Based Routing for Fault-Tolerant Parallel Computers
Abstract
Routing has an important influence on the performance of interconnection networks in parallel computers. Besides simple oblivious schemes such as xy-routing for 2D grids, there are many sophisticated adaptive and fault-tolerant routing algorithms that could not be implemented so far, because no fast hardware routers exist that can support them. This paper proposes a flexible and programmable router design for this purpose, based on a rule-based representation of routing algorithms. Using the ARON method, a fast and efficient hardware rule interpreter can be implemented. The basic principle of rule-based routers is discussed using typical adaptive routing algorithms as examples, and the structure of a prototype is presented. The effect of the longer routing decision time introduced by the rule interpreter is studied by means of simulations.
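To make the idea of a rule-based representation concrete, the following minimal sketch writes xy-routing for a 2D grid as an ordered list of rules, each pairing a premise with an output port, and evaluates them with a tiny software interpreter. It is an illustration only: the port names, the coordinate convention, the `XY_RULES` table and the `route` and `path` helpers are assumptions made for this example, and the sketch does not reproduce the paper's rule language or the ARON method.

```python
from typing import Callable, List, Tuple

Coord = Tuple[int, int]  # (x, y) position of a node in the 2D grid (assumed convention)
# A rule pairs a premise on (current node, destination node) with an output port.
Rule = Tuple[Callable[[Coord, Coord], bool], str]

# Hypothetical rule base for xy-routing: correct the x offset first, then the y offset.
XY_RULES: List[Rule] = [
    (lambda cur, dst: dst[0] > cur[0], "EAST"),
    (lambda cur, dst: dst[0] < cur[0], "WEST"),
    (lambda cur, dst: dst[1] > cur[1], "NORTH"),
    (lambda cur, dst: dst[1] < cur[1], "SOUTH"),
]

# Grid displacement associated with each port name (assumed for this example).
STEP = {"EAST": (1, 0), "WEST": (-1, 0), "NORTH": (0, 1), "SOUTH": (0, -1)}


def route(cur: Coord, dst: Coord, rules: List[Rule] = XY_RULES) -> str:
    """Return the port of the first rule whose premise holds, or LOCAL on arrival."""
    for premise, port in rules:
        if premise(cur, dst):
            return port
    return "LOCAL"


def path(src: Coord, dst: Coord) -> List[str]:
    """Apply the rule base hop by hop and collect the ports that were taken."""
    hops: List[str] = []
    cur = src
    while True:
        port = route(cur, dst)
        if port == "LOCAL":
            return hops
        hops.append(port)
        dx, dy = STEP[port]
        cur = (cur[0] + dx, cur[1] + dy)


if __name__ == "__main__":
    print(path((0, 0), (2, 1)))
```

Because the routing behaviour lives entirely in the rule table, a different algorithm could in principle be loaded by swapping the table while the interpreter stays unchanged, which is the kind of programmability the abstract describes.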
Similar resources
Design and Evaluation of a Fault-Tolerant Adaptive Router for Parallel Computers
In this paper, we propose a design methodology for fault-tolerant adaptive routers for parallel and distributed computers. The key idea of our method is to integrate minimal and non-minimal routing, each supported by independent virtual channels (VCs). Distinguishing the routing functions for each set of VCs simplifies the design of fault-tolerant algorithms. After describing the method, we sho...
Fault-Tolerant Design for Multistage Routing Networks
André DeHon, Thomas Knight Jr., and Henry Minsky. International Symposium on Shared Memory Multiprocessing, 1991. As the size of digital systems increases, the average length of time between single component failures diminishes. To avoid component-related failures, large computers must be fault-tolerant; that is, the computer must perform ...
A Compact Fault-tolerant, Deadlock-free, Minimal Routing Algorithm for n-Dimensional Wormhole Switching Based Meshes
To satisfy the ever-increasing demand for computational power, massively parallel computers are mandatory. Such machines require a compact routing scheme for scalability as well as an optimal routing algorithm to minimize the communication delay. A combination of both requirements has been reported for fault-free regular structures only. Yet in massively parallel machines the probability of failures b...
Reachability-Based Fault-Tolerant Routing
Currently, clusters of PCs are being used as a cost-effective alternative to large parallel computers. In most of them it is critical to keep the system running even in the presence of faults. As the number of nodes in these systems increases, the interconnection network grows accordingly. Along with the increase in components, the probability of faults increases dramatically, and thus, fault-tol...
DAMQ-Based Approach for Efficiently Using the Buffer Spaces of a NoC Router
In this paper we present high-performance dynamically allocated multi-queue (DAMQ) buffer schemes for fault-tolerant system-on-chip applications that require an interconnection network. Two virtual channels share the same buffer space. Fault-tolerant mechanisms for interconnection networks are becoming a critical design issue for large massively parallel computers. It is also important to hi...